A population genetic hidden Markov model for detecting genomic regions under selection.
نویسندگان
چکیده
Recently, hidden Markov models have been applied to numerous problems in genomics. Here, we introduce an explicit population genetics hidden Markov model (popGenHMM) that uses single nucleotide polymorphism (SNP) frequency data to identify genomic regions that have experienced recent selection. Our popGenHMM assumes that SNP frequencies are emitted independently following diffusion approximation expectations but that neighboring SNP frequencies are partially correlated by selective state. We give results from the training and application of our popGenHMM to a set of early release data from the Drosophila Population Genomics Project (dpgp.org) that consists of approximately 7.8 Mb of resequencing from 32 North American Drosophila melanogaster lines. These results demonstrate the potential utility of our model, making predictions based on the site frequency spectrum (SFS) for regions of the genome that represent selected elements.
منابع مشابه
Predicting CpG Islands and Their Relationship with Genomic Feature in Cattle by Hidden Markov Model Algorithm
Cattle supply an important source of nutrition for humans in the world. CpG islands (CGIs) are very important and useful, as they carry functionally relevant epigenetic loci for whole genome studies. As a matter of fact, there have been no formal analyses of CGIs at the DNA sequence level in cattle genomes and therefore this study was carried out to fill the gap. We used hidden markov model alg...
متن کاملA hidden Markov model for investigating recent positive selection through haplotype structure.
Recent positive selection can increase the frequency of an advantageous mutant rapidly enough that a relatively long ancestral haplotype will be remained intact around it. We present a hidden Markov model (HMM) to identify such haplotype structures. With HMM identified haplotype structures, a population genetic model for the extent of ancestral haplotypes is then adopted for parameter inference...
متن کاملمقایسه روش های مختلف آماری در انتخاب ژنومی گاوهای هلشتاین
Genomic selection combines statistical methods with genomic data to predict genetic values for complex traits. The accuracy of prediction of genetic values in selected population has a great effect on the success of this selection method. Accuracy of genomic prediction is highly dependent on the statistical model used to estimate marker effects in reference population. Various factors such a...
متن کاملAutomated generation and refinement of protein signatures: case study with G-protein coupled receptors
MOTIVATION Previous work had established that it was possible to derive sparse signatures (essentially sequence-length motifs) by examining points of contact between residues in proteins of known three-dimensional (3D) structure. Many interesting protein families have very little tertiary structural information. Methods for deriving signatures using only primary and secondary-structural informa...
متن کاملThe Pattern of Linkage Disequilibrium in Livestock Genome
Linkage disequilibrium (LD) is bases of genomic selection, genomic marker imputation, marker assisted selection (MAS), quantitative trait loci (QTL) mapping, parentage testing and whole genome association studies. The Particular alleles at closed loci have a tendency to be co-inherited. In linked loci this pattern leads to association between alleles in population which is known as LD. Two metr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Molecular biology and evolution
دوره 27 7 شماره
صفحات -
تاریخ انتشار 2010